(57) Abstract: The present invention relates to a gesture recognition system comprising: means for detecting and generating a signal
corresponding to a number of markers arranged on an object, means for processing said signal from said detecting means, and means for
detecting the position of said markers in said signal. The markers are divided into a first and a second set of markers, said first set of markers
constituting a reference position, and said system comprises means for detecting movement of said second set of markers and
generating a signal as a valid movement with respect to said reference position.
GESTURE RECOGNITION SYSTEM

The present invention relates to a gesture recognition system comprising: means for detecting
and generating a signal corresponding to a number of markers arranged on a body or part of the
body of a person, means for processing said signal from said detecting means, and means for
detecting the position of said markers in said signal.

The rapid development of motion analysis systems and computer-controlled devices has
introduced the possibility of new ways of interacting with computers. One preferred method is to
use different body parts as a commanding device, i.e. using movements to enter commands
into the operating system of the computer or to control peripheral devices.

However, the known methods and systems are rather complex and usually analyse the entire
body or a large area.

In WO 99/34276, a system and method for constructing three-dimensional images using
camera-based gesture inputs of a system user is described. The system comprises a
computer-readable memory, a video camera for generating video signals indicative of the gestures
of the system user and an interaction area surrounding the system user, and a video image display.
The video image display is positioned in front of the system user. The system further
comprises a microprocessor for processing the video signals, in accordance with a program
stored in the computer-readable memory, to determine the three-dimensional positions of the
body and principal body parts of the system user. The microprocessor constructs
three-dimensional images of the system user and interaction area on the video image display based
upon the three-dimensional positions of the body and principal body parts of the system user.
The video image display shows three-dimensional graphical objects superimposed to appear
as if they occupy the interaction area, and movement by the system user causes apparent
movement of the superimposed, three-dimensional objects displayed on the video image
display.



WO 01/69365 PCT/SE01/00528 


According to US 6,002,808, a system is provided for rapidly recognizing hand gestures for the 
control of computer graphics, in which image moment calculations are utilized to determine 
an overall equivalent rectangle corresponding to hand position, orientation and size, with size 
in one embodiment correlating to the width of the hand. In a further embodiment, a hole 
generated through the utilization of the touching of the forefinger with the thumb provides a 
special trigger gesture recognized through the corresponding hole in the binary representation 
of the hand. In a further embodiment, image moments of images of other objects are detected 
for controlling or directing onscreen images. 

A method and an apparatus are described in US 5,576,727 for use with a computer for 
providing commands to a computer through tracked manual gestures and for providing 
feedback to the user through forces applied to the interface. A user manipulatable object is 
coupled to a mechanical linkage, which is, in turn, supportable on a fixed surface. The 
mechanical linkage or the user manipulatable object is tracked by sensors for sensing the 
location and/or orientation of the object. A multi-processor system architecture is disclosed 
wherein a host computer system is interfaced with a dedicated microprocessor which is 
responsive to the output of the sensors and provides the host computer with information 
derived from the sensors. The host computer has an application program, which responds to 
the information provided via the microprocessor and which can provide force-feedback 
commands back to the microprocessor. The force feedback is felt by a user via the user 
manipulatable object. 

Thus, the main object of the present invention is to provide a gesture recognition method and
system which allow real-time gesture recognition without a need for complex computing
resources.

Therefore, in the initially mentioned system, the markers are divided into a first and a second set
of markers, said first set of markers constituting a reference position, and said system
comprises means for detecting movement of said second set of markers and generating a
signal as a valid movement with respect to said reference position.

In the following, the invention will be further described in a non-limiting way with reference
to the accompanying drawings, in which:

Fig. 1 is a block diagram illustrating the main parts of the system according to the
invention,
Fig. 2 is a block diagram illustrating the main parts of the marker tracking system
according to the invention,
Fig. 3 shows schematically marker arrangements on a body of a person,
Fig. 4 is a schematic view of a hand glove used as markers,
Fig. 5 is an example of an application using the system of the invention, and
Fig. 6 is another exemplary application using the system of the invention.

The system 10 according to the invention comprises two main parts, a Server part 11 and a GUI
part 12. The server part 11 comprises a number of modules: Gesture Recognition 111, Gesture
Library 112, Noise Filtering 113, Control 114 and GUI Driver 115. The GUI part 12 comprises
a GUI module 121. Moreover, the system interfaces with a Camera input 13 and external drivers 14.

The following is a description of the various modules:

Gesture Recognition 111 

This module takes an input stream of marker positions, assuming it traces the gesture to be 
recognized. This gesture is called a "research gesture". The main task of the module is to 
compare a research gesture to template gestures and to produce an estimation of similarity for 
every gesture in a library. 

Since the module provides low-level routines to accomplish the task of gesture recognition 
and it is designed to be fully controllable by the "Control Module" there is no application- 
specific code here. 

The research gesture is provided inside the module from the input stream of markers coming
from the "Noise Filtering" module. The "Control" module registers a callback function
from "Gesture Recognition" to inform the "Noise Filtering" module where to send the stream of
markers.

On the other hand, the "Gesture Recognition" module is populated by a large set of template
gestures coming from the "Gesture Library". Once a research gesture is committed by the
"Control" module to be stored in the "Gesture Library", the "Gesture Library" grabs the data
from "Gesture Recognition" for storing in a database.

The recognition process can be divided into the following steps: 

Gesture fragmentation: In order to fragment the gesture, crucial points are looked for. These 
are points, e.g. where gestures make a sharp turn, change directions and so on. These crucial 
points form a skeleton of a gesture. 
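
The search for crucial points can be illustrated with a minimal sketch. It assumes that a gesture
is a list of (x, y, z) tuples and that a "sharp turn" means a change of direction of at least
min_turn_deg degrees; the function name and the 45 degree threshold are illustrative assumptions,
not values given in the description.

```python
import math


def crucial_points(points, min_turn_deg=45.0):
    """Return indices of the skeleton: first point, sharp turns, last point.

    The threshold min_turn_deg is an assumed parameter for illustration.
    """
    if len(points) < 2:
        return list(range(len(points)))
    keep = [0]
    for i in range(1, len(points) - 1):
        v1 = [b - a for a, b in zip(points[i - 1], points[i])]
        v2 = [b - a for a, b in zip(points[i], points[i + 1])]
        n1 = math.sqrt(sum(c * c for c in v1))
        n2 = math.sqrt(sum(c * c for c in v2))
        if n1 == 0 or n2 == 0:
            continue  # repeated sample, no direction information
        cos_turn = sum(a * b for a, b in zip(v1, v2)) / (n1 * n2)
        cos_turn = max(-1.0, min(1.0, cos_turn))  # guard against rounding
        if math.degrees(math.acos(cos_turn)) >= min_turn_deg:
            keep.append(i)
    keep.append(len(points) - 1)
    return keep
```

For an L-shaped trace, only the start, the corner and the end would be kept as the skeleton.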


Coordinate transformations: Every gesture can be entered in various places in space, under 
various angles, and may be scaled as well. Thus, the gesture must be transformed to some 
normalized coordinate system. Preferably, but not exclusively, 4x4 transformation matrices 
are used to accomplish this goal. 


An average value of similarity for skeletons is thus calculated. This value is the average 
distance between corresponding crucial points normalized by the length of the skeleton (the 
length of the gesture). This is the most important criterion of similarity for gestures. 
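
As a hedged illustration of this criterion, the sketch below computes the average distance between
corresponding crucial points and divides it by the skeleton length. It assumes both skeletons are
already normalized and have the same number of crucial points, and that the length of the template
skeleton is the one used for normalization; the description does not state which of the two
lengths is meant.

```python
import math


def skeleton_length(skeleton):
    """Sum of the distances between consecutive crucial points."""
    return sum(math.dist(p, q) for p, q in zip(skeleton, skeleton[1:]))


def skeleton_similarity(research, template):
    """Average point-to-point distance, normalized by the template length.

    Lower values mean a closer match; 0.0 means identical skeletons.
    """
    if len(research) != len(template) or len(template) < 2:
        raise ValueError("skeletons must have the same number (>= 2) of points")
    length = skeleton_length(template)
    if length == 0:
        raise ValueError("template skeleton has zero length")
    average = sum(math.dist(p, q) for p, q in zip(research, template)) / len(template)
    return average / length
```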

The next step is to compare gesture fragments. The fragments also have to be normalized so
that the start and end points of the compared fragments are located in the same position in the
coordinate system. The algorithm of fragment comparison is similar to the calculation of the
area between two fragments. The lower the value of the area (normalized by the length of the
fragment) is, the greater the resemblance.
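
One possible way to approximate such an area is sketched below. It assumes both fragments are
sampled with the same number of points and share start and end points after normalization; the
area is approximated by trapezoids whose heights are the gaps between corresponding samples. This
is an illustrative approximation, not necessarily the algorithm used in the system.

```python
import math


def fragment_area_score(template, research):
    """Approximate area between two fragments, divided by the template length."""
    if len(template) != len(research) or len(template) < 2:
        raise ValueError("fragments must have the same number (>= 2) of samples")
    gaps = [math.dist(p, q) for p, q in zip(template, research)]
    area = 0.0
    length = 0.0
    for i in range(len(template) - 1):
        step = math.dist(template[i], template[i + 1])
        area += 0.5 * (gaps[i] + gaps[i + 1]) * step  # trapezoid between samples
        length += step
    if length == 0:
        raise ValueError("template fragment has zero length")
    return area / length
```

A straight fragment compared with itself scores 0.0; the more the research fragment bulges away
from the template, the higher the score.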

The last step is to combine the results of step 3 and the results of step 4 for every fragment.



Gesture Library 112 

The Gesture Library module reads template gestures from a memory and populates the module
"Gesture Recognition" with them. The library provides all functionality needed in order to
handle a database of gestures. Moreover, the gestures have additional attributes describing
which user they belong to, what command (or sequence of commands) they are attached to
and in which modes they are operational.

This module interacts mostly with the "Gesture Recognition" module in order to populate it
with a number of gestures from the database, and the "Gesture Recognition" module may also
provide it with data for a new gesture committed to be saved into the database.

On the other hand, most attributes of the gestures are intended to be interpreted by the "Control"
module. Thus, the "Control" module has access to read, modify and remove any properties of
the gestures.

One very specific use of the module is to provide the "GUI Driver" module with information
about the appearance of the gesture. This is necessary to be able to illustrate the gesture on a
display.

The main requirement for this module is to be able to serve the needs of various "Control"
modules. In order to accomplish this goal, every gesture, in addition to the data required by the
"Gesture Recognition" module, may keep any number of key=value pairs. These slots are
intended to be used in the "Control" system. Typical uses of such slots include:

- name of the gesture,
- which user the gesture belongs to,
- which mode the gesture is operational in, and
- which command(s) to the external device the gesture is bound to.
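
A library entry with such slots could be modelled as in the following sketch. The class name, the
slot keys and the example values are hypothetical; the description only states that a gesture
carries recognition data plus an arbitrary number of key=value pairs.

```python
from dataclasses import dataclass, field


@dataclass
class TemplateGesture:
    """A template gesture: geometry for recognition plus free-form slots."""
    points: list                                # marker positions (x, y, z)
    slots: dict = field(default_factory=dict)   # key=value pairs read by "Control"


# Hypothetical entry; keys and values are illustrative only.
lift = TemplateGesture(
    points=[(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    slots={
        "name": "lift",
        "user": "operator1",
        "mode": "crane",
        "commands": ["LIFT"],
    },
)
```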

Noise Filtering 113

The main function of this module is to reduce the noise in marker data so that the "Gesture 
Recognition" will recognize the gestures without being confused by any high-frequency 
oscillation of the marker. 

A so-called median filter can be used, which means that the filtered output for a marker at
time $t_i$ is taken from the middle of a sorted array generated from the positions of the marker
in a range of times from $t_{i-n}$ to $t_{i+n}$.

A common problem with any kind of filter is the delay introduced by the filter between the
signal before and the signal after the filter. In the case of a median filter the delay is n sample
periods. Using a camera providing a sample rate of, for example, 100 Hz and assuming n=2, a delay
of 20 ms is obtained. The camera can be for visible light and/or infrared (IR) light.
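
The filter and its delay can be sketched as follows. The sketch assumes that the median is taken
separately for each coordinate and that the first and last n samples are simply dropped; both
choices are assumptions made for illustration.

```python
from statistics import median


def median_filter(samples, n):
    """Median-filter a list of position tuples over the window [i-n, i+n]."""
    filtered = []
    for i in range(n, len(samples) - n):
        window = samples[i - n:i + n + 1]
        # zip(*window) groups the x values, the y values, and so on
        filtered.append(tuple(median(axis) for axis in zip(*window)))
    return filtered


def filter_delay_ms(n, sample_rate_hz):
    """Delay of n sample periods, expressed in milliseconds."""
    return 1000.0 * n / sample_rate_hz
```

With n=2 and a 100 Hz camera, filter_delay_ms returns the 20 ms stated above.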

Control 114

The tasks of this module are:

- system startup and initialization of all modules,
- user login and logout,
- switching modes,
- macro expansion,
- issuing commands to the external device,
- user customization.

The "Control" module controls all modules with the exception of the "GUI" module, which can
be operated by the "GUI Driver".

The "Control" module uses data stored in the "Gesture Library" for tasks such as:

- controlling the "External Device",
- switching modes,
- operating the user interface, and so on.

GUI Driver 115

This module serves as an intermediate level between the Control and the GUI modules. This
module prepares the data for the "GUI" to be displayed without complex conversions, accepts the
user commands and passes them back to the Control module.

There are some specific routines, e.g.:

- Mouse emulation using the "Camera Input" module.
- Aiding the "GUI" module by preparing most data for displaying the gesture on the screen
  under various 3-dimensional transformations, possibly in an animated way, and
  transforming the coordinate system on the fly.
- Keyboard emulation to allow the user to enter text data by "typing in the air".

There are two very specific uses of this module:

The first is to prepare data for mouse emulation using the "Camera Input" module. In order to be
able to accomplish this task, the stream of marker positions is sent directly to the "GUI
Driver" module.

The second is the ability to draw the gesture on the user's screen. In order to accomplish this
task, the gesture's data is accessed.

GUI (Graphic User Interface) 121

This module drives the entire user interface and interacts with the rest of the system through
the "GUI Driver" module.

Detector Input 13

The Detector Input (camera input in case cameras are used) refers to a system with a few
cameras located in various places in a working space and directed at different angles.
The output from the cameras is sent to a system which detects the location of markers and traces
this location from one frame to another. Preferably, there is a stream of 2-dimensional
coordinates from every camera; these are combined, forming a single stream of
3-dimensional coordinates for every registered marker. These are the data the system receives as
input.

There are special requirements on this module, including:

- Reliability: it is very likely that markers will be lost by the tracing
  algorithms, because they may be covered by some objects in space so
  that they cannot be seen from the cameras. This may require some
  extrapolation using the tendency of the marker movement right before
  the marker was lost; information from the cameras which still
  trace the marker may help to perform this extrapolation task.
- Noise filtering: the influence of noise filtering is described in the
  "Noise Filtering" section. Arranging the noise filtering inside the "Camera Input" module
  simplifies the task of recovering lost markers, because additional
  delays and the ability "to foresee" are obtained. The marker may get into
  the field of view of the camera again, and the task of extrapolation turns
  into the task of interpolation for just a few lost moments.

External Device 14

The "External Device" refers to external systems that the system of the invention is intended
to control. One of the most important features of the system is its relative independence from the
"External Device".

The control of the external devices, protocols of data exchange etc. is assumed to be known
to a skilled person and is not described further herein.

However, the system is adaptable to any external device which can be controlled by a set of,
e.g., Boolean signals (ON/OFF); some commands may have an arbitrary floating-point
argument (which can have the meaning of speed, altitude and so on).
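
The recovery of lost markers mentioned in the Reliability and Noise filtering requirements on the
detector input can be illustrated with a minimal sketch. It assumes constant-velocity (linear)
extrapolation from the last two known positions and linear interpolation once the marker
reappears; the description does not prescribe a particular scheme, so both functions are
illustrative assumptions.

```python
def extrapolate(previous, last, steps=1):
    """Predict a lost marker position by continuing its last movement."""
    return tuple(l + steps * (l - p) for p, l in zip(previous, last))


def interpolate_gap(before, after, missing):
    """Fill `missing` lost samples between two known positions."""
    filled = []
    for k in range(1, missing + 1):
        filled.append(tuple(b + (a - b) * k / (missing + 1)
                            for b, a in zip(before, after)))
    return filled
```

When the filter delay lets the module see the marker again before output is due, interpolate_gap
can replace the less reliable extrapolate for the few lost moments.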

The gesture recognition is carried out by positioning a number of markers on a human body or
using parts of the body as markers.

The recognition system in the server 11, here denoted system 20, is described with reference to
Fig. 2. The main task of the module is to work as a tracking system, to filter digitised information
from each optical detection device (camera) and to calculate 3D coordinates for the markers.

The tracking module consists of the following main components:

Optical detection devices (cameras) 22,
Frame grabbers 23,
Set of (colour) markers 21,
A control system 24.

Preferably, the markers (i =1 to m) are passive objects, which can be coloured. However, 
active markers can also be used. Each marker has its own unique colour (e.g. RGB 
combination). 

The positions of the markers 21 are recorded by optical detection devices 22, such as cameras.
The number of cameras is j = 1 to n (where n >= 1).

Preferably, SVHS video signals from the optical detection devices are used and digitised by the
frame grabbers 23. The frame grabber stores bitmap images from each optical detection
device for each time frame. The process is synchronised.

The control system 24 comprises a filtering module 241, a marker definition module 242, a 
hardware definition module 243, a 3D coordinate calculator 244, calibration module 245 and a 
database 246. The system filters the stored bitmap image data received from frame grabbers 
23 and prepares coordinates (X, Y) for all markers from all cameras in a time frame t. It 
means that in each moment of time t, the following data will be available from each optical 
detection device: 

Based on the information received from the filter module 241, six degree-of-freedom (DOF)
coordinates for the markers are calculated by the 3D coordinate calculation module 244.

The system uses the calibration module 245 to define a zero point in space for 3D coordinate
calculation.

The marker definition module 242 contains a description of the currently used markers. It should
also include the possibility to define multicolour markers.

The hardware definition module 243 includes settings for optical detection devices, frame grabbers
etc.

It has the following set of tasks: 

■ To read and filter the digitised signal from all cameras (received from frame
  grabbers): based on the information about the corresponding colour combination
  for each marker, extract coordinate packages (t, X, Y) for each marker (i = 1 to m)
  from each camera (j = 1 to n)
■ Calibration of optical detection devices
■ To define markers (shape, size, colour combination, etc.)
■ Transform this data to co-ordinate packages (t, X, Y, Z) for each marker (i = 1 to m)
  in each moment of time, using the camera calibration information
■ To save the set of colour combinations for system markers
■ Provide a user with a possibility to define hardware settings
■ Provide the user with responses (warnings, messages etc.)

The system can use colored markers to keep track of the user and to present the user's gesture
commands.

For example, two sets of markers will always be available in the system: Alfa and Beta markers,
see Fig. 3.

The Alfa markers 31 help the system to keep track of the user 30. As the Alfa markers are worn on
the shoulders of the user, a horizontal line or plane can be calculated. This line/plane
indicates the lowest border of the user command area. To command the system, the user has
to raise the hands above this line.
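
The use of the Alfa markers as a lower border of the command area can be sketched as follows. The
sketch assumes that the second coordinate (y) is the vertical axis and that the border is a
horizontal plane at the mean height of the two shoulder markers; both are assumptions made for
illustration.

```python
def reference_height(left_shoulder, right_shoulder):
    """Height of the horizontal reference plane defined by the Alfa markers."""
    return (left_shoulder[1] + right_shoulder[1]) / 2.0


def valid_samples(beta_track, left_shoulder, right_shoulder):
    """Keep only Beta marker positions raised above the reference plane."""
    limit = reference_height(left_shoulder, right_shoulder)
    return [p for p in beta_track if p[1] > limit]
```

Samples below the plane are discarded, so ordinary hand movements below shoulder height are not
interpreted as commands.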

Beta markers 32 serve to present the user's gesture commands.

The number of markers depends on the application and tasks. For example there are two or more 
Alfa and three or more Beta markers in the system. For Virtual Reality applications, the number 
of Beta markers can be increased. 

In general, the markers are small, ID-specific, sequential color patterns. The single color fields of 
the patterns shall be large enough to be discovered by the cameras (within a reasonable distance) 
and small enough to enable the system to calculate shifts in small movements of the user. 

The most functional way of Beta markers presentation for the VR applications is to make gloves 
with color markers printed on their fingertips, see Figs. 4a and 4b, which illustrate a glove 
representation 43 from the back of the hand and corresponding marker representations 42 
generated as fingertips. It is possible to paint the body part as markers. 


Each glove can have two base colors: for back and for palm of the hand. Every fingertip can be 
marked with its own unique color combination. 

For applications, which do not need such detailed information about the hand gestures, only two 
fingertips (thumb and index finger) could be used. 

To distinguish between the hands, the gloves have different base colors for right and left hands. 
For one hand the pair of original (for the palm of the hand) and inverted color (for the back of the 
hand) could be used. 

There are several ways of printing markers on the fingertips: the basic color of the glove is solid 
while the color of the fingertip acting as marker can be stripes or barcode, and the combination of 
the base color of the glove and the fingertip color identify the marker of the finger uniquely. 

Both methods allow using only one color per finger to get four different markers for two
fingers and twenty color combinations for ten fingers, using only five colors.
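
The count of twenty combinations can be checked with a short sketch: four base colors (palm and
back of each hand, the back being the inverted palm color) combined with five fingertip colors.
The concrete RGB values and finger names are illustrative assumptions.

```python
from itertools import product


def invert(rgb):
    """Inverted color, used for the back of the hand."""
    return tuple(255 - c for c in rgb)


# Assumed palm colors; the back of each hand uses the inverted palm color.
PALM = {"right": (255, 255, 0), "left": (0, 255, 255)}
BASE = {(hand, "palm"): color for hand, color in PALM.items()}
BASE.update({(hand, "back"): invert(color) for hand, color in PALM.items()})

# Five assumed fingertip colors, one per finger.
FINGERTIP = {
    "thumb": (0, 255, 0),
    "index": (255, 0, 255),
    "middle": (255, 255, 255),
    "ring": (0, 0, 0),
    "little": (255, 128, 0),
}

# A marker is identified by the pair (base color, fingertip color).
MARKERS = {
    (hand, side, finger): (BASE[(hand, side)], FINGERTIP[finger])
    for (hand, side), finger in product(BASE, FINGERTIP)
}
```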

The first way of presenting markers has a more complicated variation, the barcode view for
markers. This technique makes an additional parameter available for the marker
definition: the setting of the corresponding code.

Barcode-like and striped variants of marker presentation are useful only for high-resolution
cameras. In this case the distance restrictions are also more exacting.

For a standard camera configuration the second variant of marker presentation is acceptable.

The markers definition module should give the user of the system the possibility to define
the currently used color markers.

It should be possible to define the following information about the currently used markers:



❖ The Marker types (Alfa, Beta). For each marker type the following data should be 



described: 



20 



o 



Marker shape (circle, square, etc.) 



o 



Marker colour combination includes: 



• The description of color combination for Beta markers, situated on 
fingers is defined as following: 



25 



30 



The definition of base colors for gloves: right and left 
hands (back and palm of each hand). The color of the back 
of the hand is the inverted color of the palm of the hand 
The definition for color of each fingertip of the hand (so, 
using only 5 additional colors, 20 pseudo multicolor Beta 
markers, situated on the user fingertips are obtained). 



WO 01/69365 PCT/SE01/00528 

14 

The description of color combination for possible 
remaining Beta markers and Alfa markers. 
Color properties definition. 

It is very important to choose the combination of adjacent colors in a way that prevents mistakes
in the recognition process.

The combination of the base color of the glove and the color of the fingertip makes it possible to
define pseudo multicolor markers on the fingers of the hand. The system also allows
multicolor, striped and barcode markers to be defined (when high-resolution cameras are used).

The effects of marker material, interior lighting and shadows on the detection of markers
should be minimized. In the 'Color properties definition' section a logical scheme of
"distances" between colors can be established.

The 'Marker Recognition Module' (further called the MRM module) should recognize (t,X,Y)
packages for all markers from all cameras in each moment of time t, based on the information
from the 'Markers Definition Module' (MDM) and the digitised data received from the optical
detection devices.

Two types of marker recognition are implemented:

o Beta marker recognition (recognition of the center of the fingertip color
  shape, taking into account the base color of the glove, i.e. the center of a color spot
  surrounded by the base color of the glove)
o Recognition of Alfa markers and remaining Beta markers, if any (recognition of the center
  of the solid color markers situated outside the gloves)
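
Recognition of the center of a solid color marker can be illustrated with a minimal sketch. It
assumes the bitmap is a list of rows of (R, G, B) tuples, that the "distance" between colors is
the Euclidean distance in RGB space, and that a pixel belongs to the marker when this distance is
below a tolerance; the metric and the tolerance value are assumptions, not taken from the
description.

```python
def color_distance(c1, c2):
    """Euclidean distance between two RGB colors (an assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(c1, c2)) ** 0.5


def marker_center(bitmap, marker_rgb, tolerance=60.0):
    """Return the (X, Y) centroid of pixels close to marker_rgb, or None."""
    xs, ys = [], []
    for y, row in enumerate(bitmap):
        for x, pixel in enumerate(row):
            if color_distance(pixel, marker_rgb) <= tolerance:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # marker not visible in this frame
    return (sum(xs) / len(xs), sum(ys) / len(ys))
```

Together with the frame time t, the returned pair would form one (t, X, Y) package for one marker
and one camera.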



Requirements for the Markers Recognition Module (MRM):

- The module is designed so that recognition of multicolor, striped and barcode markers
  can be added in the future (when high-resolution cameras are used).
- Reliability.
- Speed of recognition.
- Scalability. The MRM module should be scalable so that it can easily be
  used in other applications.
- Compatibility. The development tool should allow working with different computer
  platforms. The MRM module should be compatible with different
  types of frame grabbers.



To be able to position the markers, methods are used to transform the positions into two- or
three-dimensional coordinates.

The task of solving the transformation matrix requires a definition of how four points should be
transformed in space. These four points should not all lie on a single plane. In fact, for the sake
of precision the best choice would be four points that form a tetrahedron with all
edges of the same length.

However, if there are only two points, a third one must be found first. In the following, a
method is described for finding a third point from two given points, so that the three points
form a triangle in space with all sides being equal.

Assuming that there are three points $P_1$, $P_2$ and $P_3$, defined as $P_1(x_1, y_1, z_1)$,
$P_2(x_2, y_2, z_2)$ and $P_3(x_3, y_3, z_3)$, a fourth point $P_4(x_4, y_4, z_4)$ is sought such
that the triangle $P_1P_2P_4$ has equal sides and all four points lie in one plane.

The general equation for a plane is the following:

$$Ax + By + Cz + D = 0 \qquad (1)$$

Another important formula is:

$$x\cos\alpha_x + y\cos\alpha_y + z\cos\alpha_z - p = 0 \qquad (2)$$

Note that $\cos\alpha_x$, $\cos\alpha_y$ and $\cos\alpha_z$ are the direction cosines of a vector
perpendicular to the plane. Also, three points can define the plane:

Moreover,

(4)

(5)

which in combination with formula 4 and normal matrix calculation gives $A$, $B$, $C$,
$\cos\alpha_x$, $\cos\alpha_y$ and $\cos\alpha_z$, and finally a vector can be built from the middle
of the triangle $P_1P_2P_3$ to the point $P_4$.

For finding a point p on a line ab so that |pc| = min, where c is an arbitrary point in space and 
a and b are distinct points, a perpendicular to the line ab from point c is found. A point where 
this perpendicular crosses the line ab is a point we are looking for. 

Consider three points $a(a_x, a_y, a_z)$, $b(b_x, b_y, b_z)$, $c(c_x, c_y, c_z)$ and a point
$p(x, y, z)$ such that the line $ab$ is perpendicular to $pc$ and the point $p$ lies on the line $ab$.

From vector algebra, it is known that the scalar product of perpendicular vectors is 0:

$$\mathbf{a}\cdot\mathbf{b} = |\mathbf{a}||\mathbf{b}|\cos\gamma \qquad (6)$$

Expressed in Cartesian coordinates for a 3D coordinate system, it becomes:

$$\mathbf{a}\cdot\mathbf{b} = a_x b_x + a_y b_y + a_z b_z \qquad (7)$$

The line equation in Cartesian coordinates, assuming two distinct points
$p_1(x_1, y_1, z_1)$ and $p_2(x_2, y_2, z_2)$, is:

$$\frac{x - x_1}{x_2 - x_1} = \frac{y - y_1}{y_2 - y_1} = \frac{z - z_1}{z_2 - z_1} \qquad (8)$$

Thus, a system of three equations for three variables is obtained:

$$\overrightarrow{ab}\cdot\overrightarrow{pc} = 0$$

Assuming the vector $v(v_x, v_y, v_z) = \overrightarrow{ab} = (b_x - a_x,\ b_y - a_y,\ b_z - a_z)$,
the following parameters for $p$ are obtained:

$$x = \frac{v_x\,(c_x v_x + c_y v_y + c_z v_z - a_y v_y - a_z v_z) + a_x\,(v_y^2 + v_z^2)}{v_x^2 + v_y^2 + v_z^2},\qquad
y = (x - a_x)\frac{v_y}{v_x} + a_y,\qquad
z = (x - a_x)\frac{v_z}{v_x} + a_z \qquad (10)$$

where $v_x^2 + v_y^2 + v_z^2 = l^2$ and $l$ is the distance between the points $a$ and $b$.
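
The same point can be computed in vector form, which avoids the division by $v_x$ in (10). The
sketch below is an equivalent formulation under that assumption: it projects $c - a$ onto
$v = b - a$ and steps from $a$ along $v$; the function name is illustrative.

```python
def closest_point_on_line(a, b, c):
    """Foot of the perpendicular from c onto the line through a and b."""
    v = [bi - ai for ai, bi in zip(a, b)]
    l2 = sum(vi * vi for vi in v)          # squared distance between a and b
    if l2 == 0:
        raise ValueError("a and b must be distinct points")
    # Parameter t of p = a + t*v, chosen so that (c - p) is perpendicular to v.
    t = sum((ci - ai) * vi for ai, ci, vi in zip(a, c, v)) / l2
    return tuple(ai + t * vi for ai, vi in zip(a, v))
```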

The task of solving the transformation matrix requires a definition of how four points should be
transformed in space. These four points should not all lie on a single plane. In fact, for the sake
of precision the best choice would be four points that form a tetrahedron with all
edges of the same length.

In this particular case we know only how two points should be translated, and we also know
one more point. However, the third point in the source and destination coordinate systems is not
identical, and this third point can be used only to define a plane in which the real third point can
be located.

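Thus, if there are two points $P_1(a_x, a_y, a_z)$ and $P_2(b_x, b_y, b_z)$, a third point
$P_3(c_x, c_y, c_z)$ is sought so that a triangle $P_1P_2P_3$ with equal sides is obtained.

If $P_1$ and $P_2$ form a vector $\overrightarrow{P_1P_2}$, the following is valid, where
$(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$ are the coordinates of $P_1$ and $P_2$:

$$\cos\alpha_x = \frac{x_2 - x_1}{\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}},\qquad
\cos\alpha_y = \frac{y_2 - y_1}{\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}},\qquad
\cos\alpha_z = \frac{z_2 - z_1}{\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}} \qquad (11)$$

$$\cos^2\alpha_x + \cos^2\alpha_y + \cos^2\alpha_z = 1 \qquad (12)$$

For the direction cosines $\cos\alpha'_x$, $\cos\alpha'_y$ and $\cos\alpha'_z$ of the sought side,
which forms an angle of 60 degrees with $\overrightarrow{P_1P_2}$:

$$\cos\alpha_x\cos\alpha'_x + \cos\alpha_y\cos\alpha'_y + \cos\alpha_z\cos\alpha'_z = \cos 60^\circ,\qquad
\cos^2\alpha'_x + \cos^2\alpha'_y + \cos^2\alpha'_z = 1 \qquad (13)$$

which for $\cos 60^\circ = 0.5$ gives

$$2\cos\alpha_x\cos\alpha'_x + 2\cos\alpha_y\cos\alpha'_y + 2\cos\alpha_z\cos\alpha'_z = 1,\qquad
\cos^2\alpha'_x + \cos^2\alpha'_y + \cos^2\alpha'_z = 1 \qquad (14)$$

and for

$$k_x = 2\cos\alpha_x,\quad k_y = 2\cos\alpha_y,\quad k_z = 2\cos\alpha_z,\qquad
x_1 = \cos\alpha'_x,\quad x_2 = \cos\alpha'_y,\quad x_3 = \cos\alpha'_z$$

and assuming that $x_3$ equals 0, the solution is

$$(k_x^2 + k_y^2)\,x_1^2 - 2k_x x_1 + 1 - k_y^2 = 0 \qquad (15)$$

$$D = 4k_x^2 - 4(k_x^2 + k_y^2)(1 - k_y^2) \qquad (16)$$

which gives

$$x_1 = \frac{2k_x \pm \sqrt{D}}{2(k_x^2 + k_y^2)} \qquad (17)$$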
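
A minimal numerical sketch of this construction is given below. It follows the assumption made
above that the third direction cosine of the sought side is zero, takes the "+" root of (17), and
recovers the second direction cosine from the linear condition in (14). The function name and the
error handling are illustrative assumptions.

```python
import math


def equilateral_third_point(p1, p2):
    """Find P3 so that P1, P2, P3 form an equilateral triangle.

    The side P1->P3 is assumed to have no component along the third axis.
    """
    d = [b - a for a, b in zip(p1, p2)]
    length = math.sqrt(sum(c * c for c in d))
    if length == 0:
        raise ValueError("P1 and P2 must be distinct")
    kx, ky = 2 * d[0] / length, 2 * d[1] / length   # k = 2 * direction cosine
    denom = kx * kx + ky * ky
    if denom == 0:
        raise ValueError("no solution with a zero third direction cosine")
    disc = 4 * kx * kx - 4 * denom * (1 - ky * ky)  # discriminant D
    if disc < -1e-12:
        raise ValueError("no real solution for this pair of points")
    x1 = (2 * kx + math.sqrt(max(disc, 0.0))) / (2 * denom)
    if abs(ky) > 1e-12:
        x2 = (1 - kx * x1) / ky                     # from kx*x1 + ky*x2 = 1
    else:
        rest = 1 - x1 * x1                          # from x1^2 + x2^2 = 1
        if rest < -1e-12:
            raise ValueError("no real solution for this pair of points")
        x2 = math.sqrt(max(rest, 0.0))
    return (p1[0] + length * x1, p1[1] + length * x2, p1[2])
```

For P1 = (0, 0, 0) and P2 = (1, 2, 2) the returned point lies at distance 3 from both points, the
same as the distance between P1 and P2.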



For the transformation of a given set of four points $P_1(x_1, y_1, z_1)$, $P_2(x_2, y_2, z_2)$,
$P_3(x_3, y_3, z_3)$ and $P_4(x_4, y_4, z_4)$ to another given set of four points
$P'_1(x'_1, y'_1, z'_1)$, $P'_2(x'_2, y'_2, z'_2)$, $P'_3(x'_3, y'_3, z'_3)$ and
$P'_4(x'_4, y'_4, z'_4)$, a transformation matrix is used. Thus, if a matrix $A$ is to be
transformed to a matrix $A'$, a transformation matrix $T$ is used:

$$A' = AT$$

The matrix multiplication gives

$$C = AB = [a_{ij}][b_{jk}] = [c_{ik}] \qquad (19)$$

wherein

$$c_{ik} = \sum_{j=1}^{n} a_{ij}\,b_{jk} \qquad (20)$$

This is used for transforming points in the coordinate systems of the system according to the
invention.
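
The product (20) and its use for transforming points can be sketched as follows. The sketch
assumes that points are stored as rows in homogeneous coordinates [x, y, z, 1], so that a 4x4
matrix $T$ applied as $A' = AT$ carries the translation in its last row; this row-vector
convention is an assumption made to match the order of the factors above.

```python
def matmul(a, b):
    """Matrix product C = AB with c_ik = sum over j of a_ij * b_jk."""
    n = len(b)
    if any(len(row) != n for row in a):
        raise ValueError("inner dimensions do not match")
    return [[sum(row[j] * b[j][k] for j in range(n)) for k in range(len(b[0]))]
            for row in a]


# Translation by (1, 2, 3) in the assumed row-vector convention.
T = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 2, 3, 1],
]
A = [
    [0, 0, 0, 1],
    [1, 1, 1, 1],
]
A_prime = matmul(A, T)
```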

The system of the invention can be used for several applications. One application is illustrated
in Fig. 5, in which the system is used as a virtual pointer device or mouse (VM). A user 50 is
provided with a number of markers 51 and 52 (Alfa and Beta markers). A number of cameras
53 are arranged so that the user is within the field of view. The cameras are connected to a
computer unit 54, which runs the system according to the invention. Thus, the movements
(gestures) of the user are translated to commands, which affect the movements of, e.g., a
pointer on the screen. This application is especially useful, for example, when lecturing and
using computers for showing images (overheads) etc., or for controlling the computers. In this
case, the system comprises additional functions for translating the gestures to pointer
commands and adapting the movement of the markers (translating coordinates) to movements
on the screen.

Another application is illustrated in Fig. 6. In this case, the user 60 controls a crane 65.
Cameras 63 are provided within the working area, preferably on the crane. The gestures of the
user are then translated to commands for controlling the crane, such as moving
forward/backwards, lifting etc. Another advantage of applying the invention in this case is that
objects in the way of the crane can be provided with markers so that the crane can avoid
the marked places, or workers can be provided with markers, e.g. on their helmets, thus achieving a
secure working environment.

Above-mentioned applications are, however, given merely as examples and do not limit the 
invention to such applications. 

The function of the system is to detect the movements of a marker and to relate the movement
to a stored movement representation, which is translated to a command. To simplify the
translation or to increase the number of possible commands, the system uses the alfa markers
as a reference line, and movements are translated or omitted with respect to the reference
line, e.g. depending on whether the movement takes place above or beneath the line. Fig. 7
schematically illustrates the translation of the marker movements. The left-hand side shows
the translation results. In Fig. 7a the movement of marker 72 is above the reference line 76,
which spans the distance between the alfa markers 71, and it is translated to an "M". Likewise,
in Fig. 7b, the movement of marker 72 is above the reference line 76 and it is translated to an
"O". In Fig. 7c the movement is under the reference line and it is omitted. The movements can
also be translated to commands such as "Move", "Click" etc.
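
The reference-line test can be sketched in a few lines of Python. This is only an illustration
under assumptions that are not stated in the description: positions are 2D points with the y
axis pointing upwards, the two alfa markers are given from left to right, and a movement counts
as valid only if every sample of the beta marker lies above the line. The function names are
hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def side_of_reference_line(alfa_a: Point, alfa_b: Point, p: Point) -> float:
    """Signed area test: positive if p lies to the left of the directed line a -> b.

    With alfa_a to the left of alfa_b and the y axis pointing up (assumption),
    "left of the line" means "above the reference line".
    """
    (ax, ay), (bx, by), (px, py) = alfa_a, alfa_b, p
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def is_valid_movement(alfa_a: Point, alfa_b: Point, trajectory: Iterable[Point]) -> bool:
    """Accept a movement only if all of its samples are above the reference line."""
    return all(side_of_reference_line(alfa_a, alfa_b, p) > 0 for p in trajectory)


# Alfa markers define a horizontal reference line from (0, 0) to (1, 0).
above = [(0.2, 0.5), (0.5, 1.0), (0.8, 0.5)]   # would be translated
below = [(0.2, -0.5), (0.5, -1.0)]             # would be omitted
print(is_valid_movement((0.0, 0.0), (1.0, 0.0), above))
print(is_valid_movement((0.0, 0.0), (1.0, 0.0), below))
```

In camera image coordinates the y axis usually points downwards, in which case the sign of the
test would have to be inverted.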

It is also possible to define macro commands, i.e. a command that comprises several
commands and corresponds to one gesture. In the case of the crane application, one gesture can
mean moving to a specific position and lifting an object. The command sequence is recorded and
related to the gesture.
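
A macro command can be pictured as a lookup from one recognised gesture to a recorded command
sequence. The following Python sketch is purely illustrative; the gesture identifier, the
command strings and the function names are hypothetical and not taken from the description.

```python
from typing import Dict, List

# Recorded macros: one gesture identifier maps to an ordered command sequence.
MACROS: Dict[str, List[str]] = {}


def record_macro(gesture_id: str, commands: List[str]) -> None:
    """Relate a recorded command sequence to a gesture."""
    MACROS[gesture_id] = list(commands)


def expand_gesture(gesture_id: str) -> List[str]:
    """Return the command sequence for a gesture, or an empty list if none is recorded."""
    return list(MACROS.get(gesture_id, []))


# Crane example: a single gesture moves to a position and lifts an object.
record_macro("G1", ["MOVE_TO 3 4", "LOWER_HOOK", "LIFT"])
for command in expand_gesture("G1"):
    print(command)
```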

It is also possible to use different hand gestures as different commands, e.g. using a left-hand
movement for one command and a right-hand movement for another.

Moreover, the markers can also be arranged on other moving objects such as robots, etc. 

In the above examples, optical markers and detection devices are exemplified. However, it is
also possible to use radio or ultrasonic markers and detection devices.

The principle of position detection in the ultrasonic/radio case is based on registering
the differences between the times at which the ultrasonic sound or radio signal reaches the
checkpoints (microphones/transceivers).

Fig. 8 shows the scheme of the position detection. The marker 81 in the centre of the room
produces an ultrasonic signal, which can be registered by four microphones 83a-83d (for 2D
position detection it is enough to have only two microphones; four can be used to obtain
better signal quality). Since the distance from the marker to each of the microphones is
different, they will register the signal at different moments in time.

The speed of sound wave propagation is about 330 m/s, which allows that difference to be
registered by devices based on ordinary electronic components. If the variation in distance
is 50 cm, the time difference will be approximately 0.5/330 ≈ 0.0015 s.
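
The arithmetic above can be reproduced with a short Python sketch. The speed of sound is the
330 m/s figure used in the text; the function name is hypothetical.

```python
SPEED_OF_SOUND_M_PER_S = 330.0  # value used in the text


def arrival_time_difference(path_difference_m: float,
                            speed_m_per_s: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Time difference (s) between two microphones for a given path difference (m)."""
    return path_difference_m / speed_m_per_s


dt = arrival_time_difference(0.5)   # 50 cm path difference
print(f"{dt * 1000:.2f} ms")        # prints "1.52 ms"
```

Read the other way round, the same relation gives the timing resolution needed for a given
distance resolution: resolving 1 cm requires registering time differences of about 30
microseconds.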

The accuracy of such a system is limited by:

■ The accuracy of the time measurement 

■ The deviation of the speed of the sound propagation (may be caused by humidity, 
temperature) 

Possible obstacles are: 

• Interference 

• Sound reflections 

• Ultrasonic emission of other electronic devices (such as computer monitors, floppy and hard 
disks, pulse power supplies, etc.) 

In order to use many markers in one room with the possibility of detecting each of them, it is
suggested to use a noise-like signal with a specific spectrum for each marker. This will also
allow the disturbances caused by other sources of ultrasonic emission to be eliminated.

An inertial sensor can detect the acceleration of the object along each of three axes. That
information can be used for determining the position of the object. The accuracy of inertial
sensors can be above 0.5%. The disadvantage of this method is that the analyzing unit will
accumulate the detection error and require periodic calibration.
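
Why the error accumulates can be seen from a minimal dead-reckoning sketch in Python. It is an
illustration only: it uses a single axis instead of three, simple Euler integration, and an
assumed constant sensor bias of 0.01 m/s² on a marker that is actually at rest. None of these
values come from the description.

```python
from typing import Iterable


def integrate_position(accel_samples: Iterable[float], dt: float) -> float:
    """Integrate acceleration twice (Euler steps) to obtain a relative position in metres."""
    velocity = 0.0
    position = 0.0
    for a in accel_samples:
        velocity += a * dt         # first integration: acceleration -> velocity
        position += velocity * dt  # second integration: velocity -> position
    return position


BIAS = 0.01   # m/s^2, assumed constant measurement error of a resting marker
DT = 0.001    # s, i.e. 1000 measurements per second

drift_1s = integrate_position([BIAS] * 1000, DT)
drift_10s = integrate_position([BIAS] * 10000, DT)
print(f"drift after 1 s:  {drift_1s:.4f} m")
print(f"drift after 10 s: {drift_10s:.4f} m")
```

Because the bias is integrated twice, the position error grows roughly with the square of the
elapsed time, which is why the position has to be recalibrated periodically.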

Taking into account the wide range of requirements, it seems logical to attempt to create a
system based on a low number of module types, combinations of which will form the
necessary configuration for a particular customer.

From the technological point of view, the best solution is to produce very few types of modules
with adjustable characteristics.

Thus, the system of the invention can be used in a small room with high requirements for
accuracy and lag time but with a low number of markers (for gesture recognition). Or, it
can be used in large rooms and corridors, as in a hotel, for guest or security guard
position detection. That wide range of applications can be achieved by a combination of several
position detection methods and centralized processing of the position data.

Ultrasonic (radio) sensors can register the presence of the marker in a certain room and its
position there.

An inertial sensor can deliver the information necessary to build a trajectory of the marker
from the last calibration point, and therefore its current position. Inertial markers are not
associated with a certain room, only with the calibration points. For each marker it is
necessary to evaluate and keep in memory the position, speed, direction, and acceleration. Even
if ultrasonic sensors perform the detection, that information will be required for inertial
sensor calibration. The trajectory can also be stored, at least for the inertial sensors. An
interesting feature may be the automatic testing of the inertial sensors. If the marker is
situated in a room equipped with ultrasonic receivers, it is interesting to compare the
positions detected by both sensor types. The difference may be used for evaluating the
inaccuracy and calculating the necessary corrections for that particular inertial sensor.
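
One simple way to picture such a comparison is to average the difference between positions
reported by both sensor types at the same instants and to use that average as a correction.
The Python sketch below assumes exactly this (a constant offset model); the description does
not specify how the correction is calculated, and all names are hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def mean_offset(inertial: List[Vec3], ultrasonic: List[Vec3]) -> Vec3:
    """Average difference (ultrasonic minus inertial) over paired position samples."""
    if not inertial or len(inertial) != len(ultrasonic):
        raise ValueError("need two non-empty sample lists of equal length")
    n = len(inertial)
    return tuple(
        sum(u[axis] - s[axis] for s, u in zip(inertial, ultrasonic)) / n
        for axis in range(3)
    )


def apply_correction(position: Vec3, offset: Vec3) -> Vec3:
    """Correct an inertial position estimate with a previously evaluated offset."""
    return tuple(p + o for p, o in zip(position, offset))


inertial_positions = [(1.0, 2.0, 0.0), (2.0, 2.0, 0.0)]
ultrasonic_positions = [(1.5, 2.0, 0.0), (2.5, 2.0, 0.0)]
offset = mean_offset(inertial_positions, ultrasonic_positions)
print(apply_correction((3.0, 2.0, 0.0), offset))
```
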

Information delivered to the custom software can consist of
• The room number and position in this room
• The position of the last calibration and relative coordinates from this point

There are two possible solutions for building an ultrasonic system. One is using transponders
in the markers and fixed microphones on the walls or ceiling. The second is using
microphones in the markers and fixed transponders mounted in the rooms.

Another aspect of the ultrasonic detection principle is the sequence of signal transmission.
One approach is to register the time of flight of one signal to different points (sensors) in
the room; the second is to transmit several signals in order to measure each distance
separately.

An example of the realization of the second approach could look like the following:

There are several transponders mounted on the walls. Each of them transmits a specific signal
in a specified order. The transmitted signal contains a time code and a transponder code. The
marker, with its microphone, receives the signals from each of the transponders and, after
decoding the time code and the transponder code and comparing the flight times, can determine
its position.
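
How a marker could turn decoded flight times into a position can be sketched as a small
trilateration in Python. The sketch rests on assumptions not stated in the description: a 2D
room, exactly three transponders at known positions, a marker clock synchronised with the
transponders so that flight times are directly available, and the 330 m/s speed of sound used
earlier in the text. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

SPEED_OF_SOUND = 330.0  # m/s, value used in the text


def locate_2d(transponders: Sequence[Tuple[float, float]],
              flight_times: Sequence[float],
              speed: float = SPEED_OF_SOUND) -> Tuple[float, float]:
    """Solve for (x, y) from three transponder positions and three flight times.

    Subtracting the circle equation of transponder 0 from those of transponders
    1 and 2 gives two linear equations, solved here with Cramer's rule.
    """
    (x0, y0), (x1, y1), (x2, y2) = transponders
    d0, d1, d2 = (speed * t for t in flight_times)
    a11, a12 = 2 * (x1 - x0), 2 * (y1 - y0)
    a21, a22 = 2 * (x2 - x0), 2 * (y2 - y0)
    b1 = d0**2 - d1**2 + x1**2 - x0**2 + y1**2 - y0**2
    b2 = d0**2 - d2**2 + x2**2 - x0**2 + y2**2 - y0**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("transponders must not lie on one line")
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


# Simulated measurement: marker at (1, 1), transponders on three walls.
walls = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
times = [math.hypot(1.0 - x, 1.0 - y) / SPEED_OF_SOUND for x, y in walls]
print(locate_2d(walls, times))
```

The sketch treats all three signals as if they were received at the same marker position.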

The problem with that design is that for fast-moving objects the position detection accuracy
will be much lower, since the signals will be received at different points.

In the case of fixed microphones and ultrasonic (US) transponders in the markers, the US marker
can consist of a power supply, a CPU (one-chip PIC) for signal generation and a US transponder.
There should also be a module that generates the command for transmitting the US signal. This
is necessary because many markers are present, working simultaneously in the same room
with the same receivers. It can be done by receiving a command over a radio channel, or by
synchronizing the time counters of all markers.

Both ways have their particular characteristics. Using the radio channel will limit the number
of markers working simultaneously in the system, since the bandwidth of that (standard,
433 MHz) channel is only 40 kbit/s. Time synchronization may in turn suffer from
clock deviation, and the attempt to eliminate this effect by increasing the gaps between the
marker signals will also limit the number of markers or increase the lag time.

Microphone units have a microphone amplifier, a signal decoding processor, and an interface
unit. Each microphone unit has an internal clock synchronized with the other microphone units.
When a microphone unit detects a marker's signal, it sends the marker ID and the
registration time to the central unit. Based on the information received from the different
microphone units, the central unit evaluates the position of the marker and sends it to the
Position server.
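
How the central unit might evaluate a position from registration times alone can be sketched
in Python as follows. The description does not say which algorithm is used; this sketch swaps
in a plain brute-force grid search over a 2D room and compares predicted with measured
arrival-time differences relative to the first microphone, so the unknown emission time
cancels out. Room size, grid step, microphone layout and all names are assumptions.

```python
import math
from typing import Sequence, Tuple

SPEED_OF_SOUND = 330.0  # m/s, value used in the text


def locate_by_time_differences(mics: Sequence[Tuple[float, float]],
                               times: Sequence[float],
                               room_size: Tuple[float, float],
                               step: float = 0.05,
                               speed: float = SPEED_OF_SOUND) -> Tuple[float, float]:
    """Return the grid point whose predicted time differences best match the measured ones."""
    best, best_err = (0.0, 0.0), math.inf
    nx = int(round(room_size[0] / step))
    ny = int(round(room_size[1] / step))
    for ix in range(nx + 1):
        for iy in range(ny + 1):
            x, y = ix * step, iy * step
            dists = [math.hypot(x - mx, y - my) for mx, my in mics]
            # Squared mismatch of arrival-time differences relative to microphone 0.
            err = sum(((d - dists[0]) / speed - (t - times[0])) ** 2
                      for d, t in zip(dists, times))
            if err < best_err:
                best, best_err = (x, y), err
    return best


# Simulated registration times for a marker at (2.0, 1.5) in a 5 m x 4 m room.
mics = [(0.0, 0.0), (5.0, 0.0), (5.0, 4.0), (0.0, 4.0)]
emit_time = 0.123  # unknown to the central unit
times = [emit_time + math.hypot(2.0 - mx, 1.5 - my) / SPEED_OF_SOUND for mx, my in mics]
estimate = locate_by_time_differences(mics, times, room_size=(5.0, 4.0))
print(estimate)
```

A grid search is easy to follow but slow; a practical central unit would more likely use a
closed-form or iterative solver.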

The marker is built of a command block, a power supply, and a US transponder. Using US
transponders in the marker may be difficult due to the high power consumption of these devices.
However, the marker in this case is a very simple device, with almost no logic and a low number
of components inside. The simple working logic will decrease the cost of software development,
since most of the logic can be programmed using high-level programming languages.
The problems in this case are the markers' synchronization and the impossibility of using the
US detection principle together with the inertial method, because calibration of the inertial
sensor requires position information to be available inside the marker.

If the marker receives and decodes the US signal, a microphone, an amplifier and a decoding
module should be built into the marker. The marker in this case processes all information and
can determine its position. Afterwards this information will be sent to the location detector.
The advantage of that design is lower use of the radio channel and the possibility of combining
US and inertial sensors.

Inertial sensors can register the value of the acceleration and therefore evaluate the relative
coordinates from the starting point. In order to get reasonably precise position information,
the acceleration value should be measured quite frequently, 1000 or more times per second.
Because many markers must work simultaneously, the measured values cannot be
transferred to the location module at that rate, which means that this information should
be processed inside the marker.
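
A rough budget makes the argument concrete. The Python sketch below uses the figures from the
text (1000 measurements per second, three axes, a 40 kbit/s radio channel); the 16 bits per
sample, the 96-bit position message and the 10 updates per second are assumptions chosen only
for illustration, and protocol overhead is ignored.

```python
SAMPLE_RATE_HZ = 1000        # from the text: 1000 or more measurements per second
AXES = 3                     # from the text: acceleration along three axes
CHANNEL_BIT_RATE = 40_000    # from the text: 40 kbit/s radio channel
BITS_PER_SAMPLE = 16         # assumption
POSITION_MESSAGE_BITS = 96   # assumption: IDs plus a relative position
POSITION_UPDATES_HZ = 10     # assumption


def max_markers(channel_bit_rate: int, bits_per_second_per_marker: int) -> int:
    """Number of markers that fit into the channel, ignoring protocol overhead."""
    return channel_bit_rate // bits_per_second_per_marker


raw_rate = SAMPLE_RATE_HZ * AXES * BITS_PER_SAMPLE             # 48000 bit/s per marker
processed_rate = POSITION_MESSAGE_BITS * POSITION_UPDATES_HZ   # 960 bit/s per marker

print(max_markers(CHANNEL_BIT_RATE, raw_rate))        # 0: one raw stream exceeds the channel
print(max_markers(CHANNEL_BIT_RATE, processed_rate))  # 41
```

Under these assumptions even a single raw acceleration stream does not fit into the channel,
whereas positions computed inside the marker leave room for dozens of markers.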



The information that markers send to the location module should in this case include:

• Marker ID 

• The ID of the last calibration point 

• Relative position to the last calibration point 
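
The three items listed above could be carried in a small record such as the following Python
sketch. Field names, types and the table of calibration points are hypothetical; the
description only names the three pieces of information.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class MarkerReport:
    marker_id: int               # marker ID
    calibration_point_id: int    # ID of the last calibration point
    relative_position: Vec3      # position relative to that calibration point (m)


def absolute_position(report: MarkerReport,
                      calibration_points: Dict[int, Vec3]) -> Vec3:
    """Location-module side: add the relative position to the known calibration point."""
    base = calibration_points[report.calibration_point_id]
    return tuple(b + r for b, r in zip(base, report.relative_position))


calibration_points = {7: (10.0, 2.0, 0.0)}   # e.g. a door with a calibration device
report = MarkerReport(marker_id=42, calibration_point_id=7,
                      relative_position=(1.5, -0.5, 0.0))
print(absolute_position(report, calibration_points))
```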

The calibration point of the marker is the place where the user receives the marker from the
administrator. If the marker is also equipped with a US module, it will calibrate its position
each time the user passes a US calibration point: doors, corridors, bar counter, etc.
Calibration points should be placed taking into account the flow of the visitors/users. For
example, in a nightclub on a ferry, guests will frequently visit the bar and the toilet and
pass through doors. That makes it possible to equip only those places with US calibration
devices and to follow the marker movement in the rest of the area by radio signals from the
inertial sensor.

When markers are used as input devices for gesture recognition, there are some peculiarities
to be considered.

The required update rate of the marker position is much higher than for tracking. For gesture
recognition, relative movement is more important than the exact position of the marker, which
makes inertial markers more suitable for this purpose.

If the same hardware device is used as a tracking system and as an input device for
gesture recognition, there could be a special switch that allows the device to
operate in different modes: gesture recognition and tracking. Switching to the gesture mode
will increase the position update rate and make gesture recognition possible. Another
way is to recognize simple gestures inside the marker and also send the result
of the recognition to the position server. That way makes it possible to use this device for
both tracking and control needs.

For example, a guard checking the rooms of a building can use the marker for controlling the
lights and cameras and for giving (alarm) signals to other guards. Inertial markers can also be
used for detecting dangerous situations, such as a passenger falling off the ferry or a guard
starting to fight with an intruder. In all these cases there will be very distinctive
movements, which can be analyzed so that corresponding actions can be initiated automatically.

Thus, the same modules can be used for ultrasonic or radio markers as for optical detection.
The same positioning, finding and coordinate translating methods as described above can be
applied to the incoming signals.

The invention is not limited to the shown embodiments but can be varied in a number of ways
without departing from the scope of the appended claims, and the arrangement and the method
can be implemented in various ways depending on application, functional units, needs and
requirements etc. Thus, a combination of the detector devices (radio, sound, light) may also
occur in the same application.

CLAIMS

1. A gesture recognition system comprising:

means for detecting and generating a signal corresponding a number of markers 
arranged on a movable object, 

means for processing said signal from said detecting means, 
means for detecting position of said markers in said signal, 
characterised in 

that said markers are divided into first and second set of markers, said first set of markers 
constituting a reference position and that said system comprises means for detecting 
movement of said second set of markers and generating a signal as a valid movement with 
respect to said reference position. 

2. The system of claim 1, 
characterised in 

that said detecting means is at least one camera and said markers are optically detectable 
markers. 

3. The system of claim 1 or 2, 
characterised in 

that said detecting means is microphones and said markers are ultrasound generating markers. 

4. The system according to any of claims 1-3, 
characterised in 

that said detecting means is radio receivers and said markers are radio transponders. 

5. The system of any of preceding claims, 
characterised in 

that the system comprises a gesture database. 

6. The system of claim 5,
characterised in 

that the system comprises a Gesture Recognition module, which takes an input stream of
marker positions, compares a research gesture to template gestures and produces an
estimation of similarity for every gesture in the database.

7. The system according to any of claims 1-6, 
characterised in 

that the gesture recognition is accomplished by Gesture fragmentation, whereby gesture
crucial points are looked for.

8. The system according to any of claims 1-7, 
characterised in 

that the gesture recognition is accomplished by coordinate transformations, where every gesture
is entered in various places in space, under various angles, and the gesture is transformed to
some normalized coordinate system.

9. The system according to any of preceding claims, 
characterised in 

that it comprises a Noise Filter, which reduces the noise in the marker data signal from the
detection means.

10. The system of claim 9, 
characterised in 

that said filter is a median filter, in which a filtered output for a marker at a time $t_i$ is
taken from the middle of a sorted array built from the positions of markers in a range of times
from $t_{i-n}$ to $t_{i+n}$.



11. The system according to any of claims 1-10, 
characterised in

that it comprises a GUI Driver, which serves as an intermediate level between a Control unit
and a GUI module. 

12. The system according to any of preceding claims, 
characterised in 

that said object is body of or parts of body of a person. 

13. A method of gesture recognition, in a system comprising: 

means for detecting and generating a signal corresponding a number of markers 
arranged on a moving object, 

means for processing said signal from said detecting means, 
means for detecting position of said markers in said signal, 
characterised by 

dividing said markers into first and second set of markers, said first set of markers constituting 
a reference position and that said system comprises means for detecting movement of said 
second set of markers and generating a signal as a valid movement with respect to said 
reference position. 